(54) Secure wireless communication user identification by voice recognition



(57) A method to authorize or authenticate a user of
a wireless telecommunication system (32) includes
steps of (a) selecting a word at random from a set of
reference words, or synthesizing a reference word; (b)
prompting the user to speak the reference word; and (c)
authenticating the user to operate in or through or with
a resource reachable through the wireless
telecommunication system, only if the user's speech
characteristics match pre-stored characteristics associated
with the reference word. In one embodiment the steps of
selecting or synthesizing, prompting and authenticating are
performed in a mobile station (10) having a speech
transducer (19) for inputting the user's speech, while in
another embodiment at least one of the steps of selecting
or synthesizing, prompting and authenticating are
performed in a wireless telecommunications network (32)
that is coupled between the mobile station and a
telephone network (35). In yet another embodiment at least
one of these steps are performed in a data communications
network resource (38) that is coupled through a
data communications network (37), such as the Internet,
and the wireless telecommunications network to the
mobile station. The step of prompting may include a step
of displaying alphanumeric text and/or a graphical
image to the user using a display (20) of the mobile station.
Description

[0001] This invention relates generally to biometric 
systems and methods and, in particular, to systems that 
identify a speaker by the automatic recognition of the 
speaker's voice and, more particularly, to a wireless tel- 
ecommunications system employing voice recognition. 
[0002] Biometric systems typically employ and meas- 
ure some physical characteristic of a particular individ- 
ual to uniquely identify that individual. The characteristic 
could be, by example, a fingerprint, a retinal pattern, or 
a voice pattern. The use of this latter characteristic is 
especially attractive for those systems that already
include a microphone, such as a telecommunications
system, as no hardware expense may need to be incurred
in order to implement the identification system. After
having uniquely identified a speaker as being a particu- 
lar, authorized individual, the system can then grant the 
speaker access to some location or to some resource. 
That is, this type of biometric system can be viewed as 
an electronic, voice actuated lock. 
[0003] One problem that arises in many such systems 
is that the system is trained to recognize a particular 
speaker using a limited set of spoken words. For exam- 
ple, the speaker may be expected to say his or her 
name, and/or some predetermined password. While this 
approach may be suitable for many applications, in oth- 
er applications the limited set of words used for identifi- 
cation may not be desirable, and may in fact lead some 
other persons to attempt to defeat the voice recognition- 
based biometric system. For example, a person at- 
tempting to defeat the system may simply surreptitiously 
tape record a person speaking the word or words that 
the biometric system expects to be spoken, and then 
play back the authorized person's speech to the voice 
input transducer of the biometric system. 
It is well known in the mobile telecommunications art to 
provide a mobile telephone, such as a vehicle-installed 
cellular telephone, with a voice recognition capability in 
order to replace or augment the normal user input
device(s). For example, the user can dial a number by
speaking the digits, or by speaking a name having a 
stored telephone number. Some commands could be 
given to the telephone in the same manner. 
[0004] In general, current user identification methods 
are based on measuring one static feature: e.g., a writ- 
ten password, a spoken password (voice recognition), 
a fingerprint, an image of the eye and so on. In the iden- 
tifying situation the user knows what is measured and 
how. 

[0005] It is an object of this invention to provide an 
improved biometric system, in particular a voice actuat- 
ed recognition system, that relies on a random set of 
words and/or images.

[0006] It is a further object of this invention to provide 
a mobile station having a speech transducer, and a 
method and apparatus to authenticate or authorize a
user of a wireless telecommunication system to operate
in, or through, or with a resource reachable through the
wireless telecommunication system, only if the user's
speech characteristics match pre-stored characteristics
associated with a word selected randomly from a training
set of words.

[0007] The foregoing and other problems are over- 
come and the objects of the invention are realized by 
methods and apparatus in accordance with embodi- 
ments of this invention. 
[0008] According to this invention, when a user enters
an identifying situation he or she does not know
beforehand what the identification stimulus will be and, thus,
what the user's reaction or response will be. Using
current technology, the most straightforward way to
implement the invention is with voice recognition. In this case
the user is presented with a voice stimulus, or a text
stimulus, or a graphical image stimulus, and the user
reacts with his or her voice. The stimulus can be direct
(e.g., the user speaks a displayed word) or indirect
(e.g., the user responds to a question that only the user
knows the answer to). Since even the correct user does
not know beforehand the details of the identification
situation, it becomes very difficult or impossible to know
beforehand what the expected correct response will be.
[0009] A method is disclosed to authorize or authen- 
ticate a user of a wireless telecommunication system, 
and includes steps of (a) selecting a word at random 
from a set of reference words, or synthesizing a random 
reference word; (b) prompting the user to speak the ref- 
erence word; and (c) authenticating the user to operate 
in or through or with a resource reachable through the 
wireless telecommunication system, only if the user's 
speech characteristics match predetermined character- 
istics associated with the reference word. 
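
By way of illustration only, steps (a) to (c) can be sketched in Python as follows. The word list, the three-element feature vectors, the Euclidean distance measure and the threshold value are all assumptions made for this sketch; the description does not prescribe any particular feature set or matching technique.

```python
import math
import secrets

# Hypothetical enrollment data: reference word -> stored feature vector.
# The numbers are made up; a real system would store spectral and
# temporal characteristics captured during training.
ENROLLED = {
    "birch": [0.12, 0.80, 0.33],
    "harbor": [0.65, 0.10, 0.47],
    "chicago": [0.40, 0.55, 0.91],
}


def select_reference_word(enrolled):
    """Step (a): pick one reference word at random from the set."""
    return secrets.choice(sorted(enrolled))


def prompt_text(word):
    """Step (b): build the prompt shown (or spoken) to the user."""
    return f"Please say: {word}"


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def authenticate(word, spoken_features, enrolled, threshold=0.1):
    """Step (c): accept only if the utterance is close to the stored template."""
    template = enrolled.get(word)
    if template is None or len(template) != len(spoken_features):
        return False
    return distance(template, spoken_features) <= threshold
```

In such a sketch the word returned by `select_reference_word` would be presented to the user, and the features extracted from the user's reply would be passed to `authenticate` together with that same word.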
[0010] In one embodiment the steps of selecting or 
synthesizing, prompting and authenticating are per- 
formed in a mobile station having a speech transducer 
for inputting the user's speech, while in another embod- 
iment at least one of the steps of selecting or synthesiz- 
ing, prompting and authenticating are performed in a 
wireless telecommunications network that is coupled 
between the mobile station and a telephone network. In 
yet another embodiment at least one of the steps of se- 
lecting or synthesizing, prompting and authenticating 
are performed in a data communications network re- 
source that is coupled through a data communications 
network, such as the Internet, and the wireless telecom- 
munications network to the mobile station. 
[0011] The step of prompting may include a step of 
displaying alphanumeric text and/or a graphical image 
to the user using a display of the mobile station. 
[0012] The above set forth and other features of the 
invention are made more apparent in the ensuing De- 
tailed Description of the Invention when read in conjunc- 
tion with the attached Drawings, wherein: 

Fig. 1 is a block diagram of a mobile station that is 
constructed and operated in accordance with this 
invention;

Fig. 2 is an elevational view of the mobile station
shown in Fig. 1, and which further illustrates a
cellular communication system to which the mobile
station is bidirectionally coupled through wireless
RF links; and

Fig. 3 is a block diagram that shows in greater detail
a plurality of data communications network resourc- 
es in accordance with further embodiments of this 
invention. 

[0013] Reference is made to Figs. 1 and 2 for illustrat- 
ing a wireless user terminal or mobile station 10, such 
as but not limited to a cellular radiotelephone or a per- 
sonal communicator, that is suitable for practicing this 
invention. The mobile station 10 includes an antenna 12 
for transmitting signals to and for receiving signals from 
a base site or base station 30. The base station 30 is a 
part of a wireless telecommunications network or sys- 
tem 32, that may include a mobile switching center 
(MSC) 34. The MSC 34 provides a connection to land- 
line trunks, such as the public switched telephone net- 
work (PSTN) 35, when the mobile station 10 is involved 
in a call. 

[0014] The mobile station includes a modulator 
(MOD) 14A, a transmitter 14, a receiver 16, a demodu- 
lator (DEMOD) 16A, and a controller 18 that provides 
signals to and receives signals from the transmitter 14 
and receiver 16, respectively. These signals include
signalling information in accordance with the air interface
standard of the applicable cellular system, and also user 
speech and/or user generated data. The particular air 
interface standard and/or access type is not germane to 
the operation of this system, as mobile stations and 
wireless systems employing most if not all air interface 
standards and access types (e.g., TDMA, CDMA, FD- 
MA, etc.) can benefit from the teachings of this inven- 
tion. 

[0015] It is understood that the controller 18 also in- 
cludes the circuitry required for implementing the audio 
and logic functions of the mobile station. By example, 
the controller 18 may be comprised of a digital signal 
processor device, a microprocessor device, and various 
analog to digital converters, digital to analog converters, 
and other support circuits. The control and signal 
processing functions of the mobile station 10 are allo- 
cated between these devices according to their respec- 
tive capabilities. In many embodiments the mobile sta- 
tion 10 will include a voice encoder/decoder (vocoder) 
18A of any suitable type. 

[0016] A user interface includes a conventional
earphone or speaker 17, a conventional microphone 19, a
display 20, and a user input device, typically a keypad
22, all of which are coupled to the controller 18. The
keypad 22 includes the conventional numeric (0-9) and
related keys (#,*) 22a, and other keys 22b used for
operating the mobile station 10. These other keys 22b may
include, by example, a SEND key, various menu
scrolling and soft keys, and a PWR key. The mobile station
10 also includes a battery 26 for powering the various
circuits that are required to operate the mobile station.
The mobile station 10 also includes various memories,
shown collectively as the memory 24, wherein are
stored a plurality of constants and variables that are
used by the controller 18 during the operation of the
mobile station. The memory 24 may also store all or some
of the values of various wireless system parameters and
the number assignment module (NAM). An operating
program for controlling the operation of controller 18 is
also stored in the memory 24 (typically in a ROM
device).

[0017] In accordance with the teachings of this
invention, the controller 18 includes a speech recognition
function (SRF) 29 that receives digitized input that
originates from the microphone 19, and which is capable of
processing the digitized input and of comparing the
characteristics of the user's speech with pre-stored
characteristics stored in the memory 24. If a match
occurs then the controller 18 is operable to grant the
speaker access to some resource, for example to a
removable electronic card 28 which authorizes or enables
the speaker to, in a typical application, make a
telephone call from the mobile station 10. For example, the
subscriber data required to make a telephone call, such
as the Mobile Identification Number (MIN), and/or some
authentication-related key or other data, can be stored
in the card 28, and access to this information is only
granted when the user speaks a word or words that are
expected by the SRF 29, and which match
predetermined enrollment (training) data already stored in the
memory 24.

[0018] Further in accordance with this invention, the
training data could as well be stored in some other
memory, such as a memory 28A within the card 28, or in a
memory 32A located in the system 32, or in some
remote memory that is accessible through the system 32.
For example, and referring specifically to Fig. 2, a
memory 39 storing the training data set could be located in
a data communications network (e.g., the Internet)
entity or resource 38, which is accessible from the PSTN
35 through a network interface 36 (e.g., an Internet
Service Provider or ISP), and a local area or wide area
data communications network 37 (e.g., the Internet). In
this case it can be appreciated that at least some of the
data is packetized and sent in TCP/IP format.

[0019] In general, the identification system and
software, as well as the prestored speech samples and
characteristics, may be located in the mobile station 10,
in a server of the network 37 or the system 32, or in the
system of a service provider.

[0020] In accordance with an aspect of this
invention the user can be prompted to speak one or a set of
words, with the specific word to be spoken being
selected randomly from the set of known words by the SRF
29. Assuming that the set of known words has a
non-trivial number of elements, then it becomes difficult for
another person to defeat the SRF 29 by recording a
word or words expected to be spoken by the user.
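
The effect of the set size can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each prompt is drawn uniformly and independently from the set and that a replayed recording of the prompted word would otherwise be accepted; neither assumption is taken from the description, and the function name is hypothetical.

```python
from fractions import Fraction


def replay_success_probability(set_size, recorded_words, attempts=1):
    """Chance that at least one of `attempts` prompts asks for a word
    that the attacker has previously recorded.

    Assumes uniform, independent selection of the prompted word.
    """
    if set_size <= 0 or not 0 <= recorded_words <= set_size:
        raise ValueError("need set_size > 0 and 0 <= recorded_words <= set_size")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    miss = Fraction(set_size - recorded_words, set_size)
    return 1 - miss ** attempts


# A single fixed password is always covered by one recording,
# whereas one recording out of a 50-word set is prompted 1 time in 50.
fixed_password = replay_success_probability(1, 1)
fifty_words = replay_success_probability(50, 1)
```

Under these assumptions a system limited to one password is defeated by a single recording, while an attacker holding one recording from a 50-word set succeeds on a given prompt with probability 1/50.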
[0021] The user can be prompted to speak the selected
word or words in various ways. In the simplest way the
SRF 29 displays the selected word on the display 20. 
Alternatively, the SRF 29 can use a speech synthesizer 
and the mobile station's speaker 17 to audibly prompt 
the user for the word to be spoken. In another embodi- 
ment the display 20 is used to present some graphical 
image corresponding to a word to be spoken (e.g., a 
tree). In a further embodiment some generic graphical 
image is used to suggest to the user a predetermined 
word to be spoken, and that was previously agreed upon 
during the training or enrollment stage. For example, it 
can be agreed upon that when presented with the graph- 
ical image of a tree the user will speak the word "birch", 
and that when presented with a graphical image of a city 
skyline the user will speak the word "Chicago". In this 
latter embodiment, even if an unauthorized person
were to gain possession of the user's mobile station
10, it is unlikely that the unauthorized person will give 
the correct reply word when presented with a particular 
graphical image or icon, let alone speak the reply word 
in a manner that would be recognized by the SRF 29 as 
a valid response. 
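
The indirect, image-based prompt can be sketched as a lookup table that is filled during enrollment. The image identifiers below are hypothetical placeholders, and the sketch checks only that the recognized reply word is the agreed one; comparing the speaker's voice characteristics would remain a separate step.

```python
import secrets

# Hypothetical enrollment table: image identifier -> reply word agreed
# upon during training. The file names are placeholders.
AGREED_REPLIES = {
    "tree.png": "birch",
    "skyline.png": "chicago",
}


def choose_image_prompt(replies):
    """Pick an image at random and return it with its expected reply word."""
    image = secrets.choice(sorted(replies))
    return image, replies[image]


def reply_word_is_correct(image, recognized_word, replies):
    """True only if the recognized word is the one agreed for this image."""
    expected = replies.get(image)
    return expected is not None and recognized_word.strip().lower() == expected
```

A person who does not know the agreed mapping would most naturally answer "tree" to the tree image, which this check rejects before any voice comparison is attempted.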

[0022] If the set of training words is stored in the
mobile station 10, whether in the memory 24 or the card
28, the words can be encrypted to prevent unauthorized 
access and/or modification. 

[0023] Referring to Fig. 3, it can also be appreciated 
that the SRF 29 can be resident outside of the mobile 
station 10, such as at one or more network entities or 
resources 38A-38D (e.g., a credit card supplier, stock 
broker, retailer or bank.) In this embodiment, and as- 
suming for example that the user wishes to access his 
account at the bank 38D, the SRF 29 signals back to 
the mobile station 10 a randomly selected word to be 
spoken by the user, via the network 37, network inter- 
face 36, and wireless system 32. The user speaks the 
word and, in one embodiment, the spectral and temporal 
characteristics of the user's utterance are transmitted 
from the mobile station 10 as a digital data stream (not 
as speech per se) to the SRF 29 of the bank 38D for 
processing and comparison. In another embodiment the 
user's spoken utterance is transmitted in a normal man- 
ner, such as by transmitting voice encoder/decoder 
(vocoder 18A) parameters, which are converted to 
speech in the system 32. This speech is then routed to 
the SRF 29 of the bank 38D for processing and
comparison. It should be noted that the spectral and temporal
characteristics transmitted in the first embodiment
could be the vocoder 18A output parameters as well,
which are then transmitted on further to the SRF 29 of
the bank 38D, without being first converted to a speech
signal in the system 32. In this case the necessary
signalling protocol must first be defined and established so
that the system 32 knows to bypass its speech decoder.
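
As a minimal sketch of sending speech characteristics as a digital data stream rather than as speech, the following packs a feature vector into bytes and recovers it at the receiving side. The wire layout (a 16-bit count followed by 32-bit floats in network byte order) is an assumption chosen for illustration; the description leaves the signalling protocol to be defined.

```python
import struct


def pack_features(features):
    """Serialize a feature vector: 2-byte count, then 4-byte floats."""
    return struct.pack(f"!H{len(features)}f", len(features), *features)


def unpack_features(payload):
    """Recover the feature vector produced by pack_features."""
    (count,) = struct.unpack_from("!H", payload, 0)
    return list(struct.unpack_from(f"!{count}f", payload, 2))
```

The receiving SRF would pass the unpacked values directly to its comparison step, with no intermediate conversion to a speech signal.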
[0024] It is also within the scope of the teaching of this
invention to provide a centralized SRF 29A, whose
responsibility it is to authenticate users for other locations.
For example, assume that the user of the mobile station
10 telephones the bank 38D and wishes to access an
account. In this case the user authentication process is
handled by the intervention of the SRF 29A which has
a database (DB) 29B of recognition word sets and
associated speech characteristics for a plurality of different
users. The SRF 29A, after processing the user's speech
signal, signals the bank 38D that the user is either
authorized or is not authorized. This process could be
handled in several ways, such as by connecting the user's
call directly to the SRF 29A, or by forwarding the user's
voice characteristics from the bank 38D to the SRF 29A.
In either case the bank 38D is not required to have the
SRF 29, nor are the other network resources 38A-38C.
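
The division of responsibility between a centralized recognition function and a relying resource can be sketched as follows. The class name, the distance measure and the threshold are assumptions for illustration; the dictionary merely stands in for the database 29B.

```python
from dataclasses import dataclass, field


@dataclass
class CentralSRF:
    """Toy stand-in for a centralized SRF with a per-user database."""
    # user id -> {recognition word -> stored feature vector}
    db: dict = field(default_factory=dict)
    threshold: float = 0.1  # assumed acceptance threshold

    def enroll(self, user, word, features):
        """Create or update one database entry during training."""
        self.db.setdefault(user, {})[word] = list(features)

    def verify(self, user, word, features):
        """Return True only if the features match the user's template."""
        template = self.db.get(user, {}).get(word)
        if template is None or len(template) != len(features):
            return False
        dist = sum((a - b) ** 2 for a, b in zip(template, features)) ** 0.5
        return dist <= self.threshold


def bank_grants_access(srf, user, word, features):
    """The bank stores no speech data; it relies on the SRF's yes/no answer."""
    return srf.verify(user, word, features)
```

In this arrangement each user's word set can differ, and a relying resource needs only to forward the received characteristics and act on the returned decision.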
[0025] It should be noted that the set of recognition
words stored in the DB 29B could be different for every
user. It should be further noted that this process implies
that at some time the user interacts with the SRFs 29,
or just with the SRF 29A, in order to execute an
enrollment or training process whereby the user's database
entries (set of recognition words and the associated
speech temporal and spectral characteristics) are
created. As was noted above, at least some of these
speech characteristics could be based on or include
voice encoder 18A parameters.

[0026] As an exemplary embodiment of this invention
about 20-50 prestored voice samples can be used, and
the stimulus and the sample are randomly or
pseudorandomly selected among these (e.g., text-dependent
speaker verification). In that the user records the
samples himself or herself, the connection between the
stimulus and the sample may be meaningful only for the
user. Also, due to the provided stimulus the user is not
required to memorize one or more passwords or numeric
codes. Furthermore, there can be different sets of
samples for different network services. For example, one set
of samples may be used to obtain access to a network
e-mail facility, while another set of samples may be used
to obtain access to a network voice mail facility. As
employed herein the term "random" is considered to
encompass both truly random as well as pseudorandom.
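
Selecting a stimulus from a service-specific sample set might be sketched as below. The service names and words are hypothetical, and the sets are kept far smaller than the 20-50 samples mentioned above for brevity. Passing a seeded generator gives pseudorandom selection, while the default draws from the operating system's entropy source.

```python
import random

# Hypothetical sample sets, one per network service.
SAMPLE_SETS = {
    "email": ["birch", "harbor", "lantern"],
    "voicemail": ["chicago", "meadow", "copper"],
}


def pick_stimulus(service, rng=None):
    """Choose one stimulus word from the set enrolled for `service`."""
    if rng is None:
        rng = random.SystemRandom()  # OS-provided randomness
    return rng.choice(SAMPLE_SETS[service])
```

For example, `pick_stimulus("email")` draws from the e-mail set only, and `pick_stimulus("voicemail", random.Random(7))` gives a reproducible pseudorandom choice from the voice mail set.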
[0027] For the case where speech synthesizing
techniques improve sufficiently, it is also possible that the
prestored samples are not required, but instead the
system creates one or more synthesized reference word(s)
that are compared to the user's voice response
(text-independent speaker verification). The generated
reference word is preferably generated randomly or
pseudorandomly.

[0028] Furthermore, it should be appreciated that the
teachings of this invention could be combined with the
use of one or more other types of identification systems
and techniques, such as fingerprint identification. Also,
various ones of the stimulus types described above
could be used in combination. For example, the user
may be presented with a randomly selected or
generated alphanumeric string that the user is expected to
vocalize, as well as with a related or totally unrelated
graphical image to which the user is expected to verbally
respond.

[0029] While the invention has been described in the 
context of preferred and exemplary embodiments, it 
should be realized that a number of modifications to 
these teachings may occur to one skilled in the art. By 
example, any suitable speech processing techniques 
that are known for use in speech recognition systems 
can be employed, and the teachings of this invention 
are not limited to use with any specific technique.
[0030] Furthermore, while the user may be prompted 
to speak a reference "word", it can be appreciated that 
the "word" may actually be a phrase comprised of a plu- 
rality of words and also possibly numbers (e.g., a date, 
or an address). 

[0031] Thus, while the invention has been particularly 
shown and described with respect to preferred embod- 
iments thereof, it will be understood by those skilled in 
the art that changes in form and details may be made 
therein without departing from the scope and spirit of the 
invention. 



Claims 

1. A method to authenticate a user of a wireless
telecommunication system, comprising steps of:

selecting a word at random from a set of
reference words;

prompting the user to speak the selected word;
and

authenticating the user to operate in or through
or with a resource reachable through the
wireless telecommunication system, only if the
user's speech characteristics match pre-stored
characteristics associated with the selected
word.

2. A method as in claim 1, wherein the steps of
selecting, prompting and authenticating are performed in
a mobile station having a speech transducer for
inputting the user's speech.

3. A method as in claim 1, wherein at least one of the
steps of selecting, prompting and authenticating are
performed in a wireless telecommunications
network coupled between a mobile station having a
speech transducer for inputting the user's speech
and a telephone network.

4. A method as in claim 1, wherein at least one of the
steps of selecting, prompting and authenticating are
performed in a data communications network
resource that is coupled through a data
communications network and a wireless telecommunications
network to a mobile station having a speech
transducer for inputting the user's speech.

5. A method as in claim 4, wherein the data
communications network is comprised of the Internet.

6. A method as in claim 1, wherein the step of
prompting includes a step of displaying alphanumeric text
to the user using a display of a mobile station having
a speech transducer for inputting the user's speech.

7. A method as in claim 1, wherein the step of
prompting includes a step of displaying a graphical image
to the user using a display of a mobile station having
a speech transducer for inputting the user's speech.

8. A method to authenticate a user of a wireless
telecommunication system, comprising steps of:

synthesizing a random reference word;

prompting the user to speak the synthesized
reference word; and

authenticating the user to operate in or through
or with a resource reachable through the
wireless telecommunication system, only if the
user's speech characteristics match
characteristics associated with the synthesized reference
word.

9. A method as in claim 8, wherein the steps of
synthesizing, prompting and authenticating are
performed in a mobile station having a speech
transducer for inputting the user's speech.

10. A method as in claim 8, wherein at least one of the
steps of synthesizing, prompting and authenticating
are performed in a wireless telecommunications
network coupled between a mobile station having a
speech transducer for inputting the user's speech
and a telephone network.

11. A method as in claim 8, wherein at least one of the
steps of synthesizing, prompting and authenticating
are performed in a data communications network
resource that is coupled through a data
communications network and a wireless telecommunications
network to a mobile station having a speech
transducer for inputting the user's speech.

12. A method as in claim 11, wherein the data
communications network is comprised of the Internet.

13. A method as in claim 8, wherein the step of
prompting includes a step of displaying alphanumeric text
to the user using a display of a mobile station having
a speech transducer for inputting the user's speech.

14. A method as in claim 8, wherein the step of
prompting includes a step of displaying a graphical image
to the user using a display of a mobile station having
a speech transducer for inputting the user's speech.

15. A wireless telecommunication system, comprising:

at least one base station;

at least one mobile station comprising a
transceiver for conducting wireless communications
with said base station, said mobile station
further comprising a user interface and a
microphone for inputting a user's speech;

a first subsystem coupled to said user interface
for prompting the user to speak a reference
word that is randomly selected from a set of
reference words, or that is randomly generated;
and

a second subsystem coupled to said
microphone for authenticating the mobile station to
operate in the wireless telecommunications
system, or through the wireless
telecommunications system, or with a resource that is
reachable through the wireless telecommunication
system, only if the user's speech
characteristics match expected characteristics associated
with the reference word.

16. A system as in claim 15, wherein one or both of the
first and second subsystems are located in one of
the mobile station, in the base station or in a
controller coupled to the base station, or in a data
communications network entity that is coupled through
a data communications network to the wireless
telecommunications system.

17. A system as in claim 16, wherein the data
communications network is comprised of the Internet.

18. A system as in claim 15, wherein the first subsystem
employs said user interface to at least one of
present alphanumeric text to the user using a
display of said mobile station, or to present a graphical
image to the user using said display of said mobile
station.

19. A system as in claim 15, wherein said mobile station
further comprises a voice digitizing and processing
system, and wherein said first subsystem further
comprises means for transmitting an output of said
voice digitizing and processing system to said base
station, and wherein at least said second
subsystem is located external to said mobile station.

20. A system as in claim 15, wherein at least said
second subsystem is located in a network entity that is
coupled to a data communications network that is
bidirectionally coupled to said system.